This study uses statistical techniques to evaluate reports on suicide scenes. It draws on 80 reports from
different locations in Brazil, randomly collected from both federal and state jurisdictions. We aimed to
assess a heterogeneous group of cases in order to obtain an overall perspective of the problem. We
evaluated variables regarding the characteristics of the crime scene, such as the traces found (blood,
instruments and clothes), and we addressed the methodology employed by the experts.
A qualitative approach using basic statistics revealed wide variation in how the issue was
addressed in the documents. We then examined a quantitative approach based on an empirical equation
and used multivariate procedures to validate it. The methodology successfully identified the main
differences in the information presented in the reports, showing that there is no standardized method
of analyzing evidence.

©  2014  Elsevier  Ltd  and  Faculty  of  Forensic  and  Legal  Medicine.  All  rights  reserved. 


1.  Introduction 

The  initial  approach  to  the  crime  scene  is  crucial  to  case-solving. 
However,  the  resources  available  to  each  jurisdiction  vary.  In 
Brazil,  approaches  to  crime  scenes  differ  even  for  crimes  of  the 
same  type.  Because  desirable  uniformity  is  lacking,  a  series  of 
questions  regarding  the  procedures  for  crime-scene  analysis  and 
their  results  may  arise.  Analysis  of  material  evidence  requires 
greater  technical  precision  to  improve  the  investigative  process. 
Some isolated efforts already exist to create regulations that
standardize the procedures to be adopted at crime
scenes. In the USA, for example, the National Institute of Justice
provides guidelines for crime scene investigation.


Abbreviations: RR, Report Relevance; Wv, Variable Weight; Fc, Context Factor;
PCA, Principal Component Analysis; KNN, K-th Nearest Neighbor; SIMCA, Soft
Independent Modeling of Class Analogies; PLS, Partial Least Squares; LOO, Leave One
Out; LNO, Leave N-Out; RMSEV, Root Mean Square Error of Validation; RMSEC, Root
Mean Square Error of Calibration.

While homicide consists of killing someone else, suicide is the
act of deliberately taking one's own life. Many reasons may lead a
person to commit suicide, including mental disorders and some
physical illnesses. For the USA National Center for Injury Prevention
and Control, suicidal self-directed violence is the “Behavior
that is self-directed and deliberately results in injury or the potential
for injury to oneself. There is evidence, whether implicit or explicit, of
suicidal intent”. Murdering someone and committing suicide are
extreme acts of aggression that shock, amaze, and affect society and
the closest survivors, as well as the nation's economy. For justice
and investigation purposes, establishing the difference between
these behaviors is essential to clarify and define the dynamics in a
crime scene. International or intercultural comparisons of suicide
methods help to gain deeper understanding of the interplay between
these two factors, and provide a basis for preventive
strategies.

Although the legal and psychological distinctions between homicide
and suicide seem to be straightforward, the differential
diagnosis of these two forms of violent death is no easy task for the
experts during the analysis of a crime scene, especially in cases of
suicide simulation. The specialized literature contains reports on
cases in which it is difficult to ascertain whether the action is
homicide, suicide, or accident. Additionally, some papers have
described suspicious and simulated suicide; in very singular
situations, homicide simulation can occur.

Despite the complexity of the analysis, experts may reach a
satisfactory conclusion about the cause of death if they examine the
crime scene thoroughly and correctly identify the traces generated
during the violent action. The individual analysis of traces and their
connection are key to establishing crime dynamics and the criminal's
mode of action.

Advances in scientific methodology have influenced the development
of expertise in the sense of avoiding biased interpretation.
Scientists have improved technical tests, which has made forensic
investigation more reliable. Scientific methods, specific protocols,
statistical tools, and other objective criteria are important in
establishing and strengthening forensic work as a science.

When a criminal offense is committed, all the evidence should
be assessed jointly. It should be collected and evaluated in order
to determine the identity of the criminal. Forensic investigation
involves applying a scientific method to crime investigation and
provides vital, objective information about the case. Forensic
examination consists of the following phases: recognition, identification,
comparison, individualization and interpretation of tests.

Recent advances in science and technology have provided
forensic scientists with a vast number of methods and techniques.
When experts assess the physical evidence, they gather it together
and quantify the contribution of a particular suspect in the event.
To solve this problem, many experts employ statistical tools in order
to interpret the results. Statistical analysis of forensic data has
acquired growing importance in courts. Forensic scientists can now
evaluate and interpret evidence that includes elements of uncertainty.
The literature also reports cases of subjective bias in
fingerprint and DNA analysis.

This study aims to examine expert reports of crime scenes of
suicide by using statistical tools to assess the gaps and weaknesses
in the procedures described in the reports. Our overall aim is to gain an
idea of the dimension of the problem and offer some positive
feedback to official expertise, showing the need to design a standardized
procedure for the analysis of crime scenes related to violent
deaths in Brazil.

2.  Material  and  methods 

Eighty  reports  of  suicide  were  analyzed  after  being  randomly 
collected  from  different  jurisdictions  and  locations.  The  objective 
was  to  evaluate  a  heterogeneous  group  of  cases  to  formulate  an 
overview  of  the  analysis. 

The first step was to determine the cause of death in each case
and then formulate questions about the methodology. There were
19 variables, associated with the questions listed in Table 1. The
possible answers were YES, NO or Impossible to Determine (ID),
which were attributed the values 1, -1 and 0, respectively. A NO answer
indicates that something that should have been present was absent, and it
constitutes a negative factor for the item. Impossible to Determine refers
to situations in which it was not possible to identify a YES or NO
answer for the variable, due to lack of information in the report.
For example, if the report did not cite clothes, analysis of this variable
was impossible. However, this does not mean that the experts did
not analyze the variable; it only means that the information does not
exist in the report.
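The coding just described can be illustrated with a minimal Python sketch. The function names and the five-answer example report are hypothetical and serve only as an illustration; the mapping YES = 1, NO = -1, ID = 0 is the one stated above.

```python
from collections import Counter

# Numeric coding described in the text: YES -> 1, NO -> -1, ID -> 0.
ANSWER_CODES = {"YES": 1, "NO": -1, "ID": 0}


def encode_report(answers):
    """Convert a list of 'YES'/'NO'/'ID' strings into the tri-valued vector."""
    return [ANSWER_CODES[a] for a in answers]


def tally(answers):
    """Count how many YES, NO and ID answers a report contains."""
    counts = Counter(answers)
    return {label: counts.get(label, 0) for label in ANSWER_CODES}


# Hypothetical report with only five of the 19 questions, for illustration.
example_answers = ["YES", "YES", "NO", "ID", "YES"]
print(encode_report(example_answers))
print(tally(example_answers))
```

Stacking one encoded vector per report gives the tri-valued data matrix (reports as rows, questions as columns) used by the multivariate methods.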

From  these  variables,  the  overall  quality  of  each  report  was 
calculated  using  the  following  auxiliary  variables: 

Report Relevance (RR) indicates how representative a
report is in terms of the information that it contains. It is calculated
from an empirical equation built on two parameters: Variable
Weight and Context Factor.


Table  1 

Variables  studied  in  the  analysis. 

V01  Were  injuries  characterized  in  the  report? 

V02  Did  the  report  contain  details  about  these  injuries? 

V03  Was  the  violent  act  performed  by  means  of  an  instrument? 

V04  Was  the  instrument  collected? 

V05  Was  the  instrument  analyzed? 

V06  Did  the  report  describe  the  absence  of  typical  lesions  related  to  fighting 
or  defense? 

V07  Were  the  victim's  clothes  mentioned? 

V08  Were  the  victim's  clothes  analyzed? 

V09  Was  blood  at  the  scene  mentioned? 

V10  When  found,  were  the  bloodstains  analyzed? 

V11  Was  the  body  position  described?

V12  Was  the  body  position  related  to  the  dynamics  of  the  facts? 

V13  Did  the  report  present  a  dynamic  compatible  with  the  evidence  at  the 
crime  scene  that  could  rule  out  homicide  (suicide  simulation)? 

V14  In  addition  to  the  tests  performed  at  the  scene,  were  additional 
laboratory  tests  conducted? 

V15  Did  the  report  discuss  the  characteristics  of  the  scene? 

V16  Is  there  a  classification  regarding  the  characteristics  of  the  crime  scene?
(e.g.,  reputable  or  disreputable;  mediate,  immediate  or  related,  etc.)

V17  Was  the  evidence  of  violence  photographed? 

V18  Did  the  report  show  a  sketch  to  enable  better  understanding  of  the 
facts? 

V19  Did  the  report  use  appropriate  language  (clear,  objective,  and 
grammatically  correct)? 


Variable Weight (Wv) is intended to correct distortions
regarding the importance of each variable. Each variable is associated
with a numerical value according to the importance of the information, i.e.,
how significant the specific condition is for the report. The weights
were set as 1 when the variable was considered relevant, 2 when
it was considered necessary and 3 when it was considered
fundamental. Table 2 lists the reasons for these values for
each variable.

Context Factor (Fc) is a means of weighting each variable
according to the context of the criminal action. It is specific to each
report and provides a more sensitive analysis, because the situation
can affect the relevance of the variables. For example, the analysis
of the instrument was considered to be fundamental, but if the
cause of death was a fall, no further analysis was necessary.
In this situation, although the variable is important, its
absence is completely acceptable. The same applies if a gun was
found at the crime scene, but the cause of
death was hanging and no bullet wounds were found on the body. In
this context, the gun analysis is relevant but not necessarily associated
with the case. Fc values were 0 for irrelevant, which means
that the answer does not apply to the studied case; 1 for relevant; 2
for necessary; and 3 for fundamental.

The  parameters  described  above  were  developed  to  provide  an 
empirical  equation  for  Report  Relevance,  given  by: 


$$RR = \sum_{i} W_v(i)\, F_c(i)\, V_q(i) \qquad (1)$$


where  Vq  is  the  variable  of  the  question  (sum  of  answers  to  the 
formulated  variables).  RR  ranges  from  0  to  1.  This  equation  seeks  to 
provide  a  quantitative  indication  of  the  amount  of  the  information 
accounted  for  in  each  report. 
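A minimal Python sketch of the weighted sum of Wv(i), Fc(i) and Vq(i) is given below. The weights are those listed in Table 2. The rescaling to the interval from 0 to 1 is an assumption made only for this illustration (the text states the range of RR but the sketch's normalization is not taken from it), and the context factors used in the example are hypothetical.

```python
# Variable weights Wv for V01..V19, as listed in Table 2.
WV = [2, 3, 1, 2, 3, 3, 1, 2, 1, 3, 2, 3, 3, 2, 1, 1, 2, 2, 1]


def weighted_sum(answers, fc, wv=WV):
    """Sum over i of Wv(i) * Fc(i) * Vq(i), with answers coded 1, -1 or 0."""
    return sum(w * f * v for w, f, v in zip(wv, fc, answers))


def report_relevance(answers, fc, wv=WV):
    """Weighted sum rescaled to [0, 1].

    ASSUMPTION: the rescaling (S + M) / (2 * M), with M = sum of Wv(i) * Fc(i),
    is an illustrative choice and is not taken from the paper.
    """
    m = sum(w * f for w, f in zip(wv, fc))
    if m == 0:
        raise ValueError("all context factors are zero")
    return (weighted_sum(answers, fc, wv) + m) / (2 * m)


# Hypothetical context factors: every variable treated as 'relevant' (Fc = 1).
fc_example = [1] * 19
print(report_relevance([1] * 19, fc_example))   # report with only YES answers
print(report_relevance([-1] * 19, fc_example))  # report with only NO answers
```

Under this assumed rescaling, a report with only YES answers scores 1 and a report with only NO answers scores 0; setting Fc(i) = 0 removes a variable from both the sum and the normalization.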

To test whether RR is meaningful, it was validated using the
following multivariate tools:


a) Pattern recognition was used to identify the characteristics of
the data set and associate similarities among the data. This was
achieved by observing natural clustering (unsupervised
learning) or by using classification techniques (supervised
learning).

b) Partial Least Squares (PLS) was applied to the data, with RR as
the dependent (or projection) variable. The main objective was
to identify which variables were more important in the
composition of RR values.


Table 2

Variable weight and reason for relevance. The values are associated with: (1) relevant,
(2) necessary, and (3) fundamental.

Variable  Weight  Reason for relevance
V01       2       Injuries provide vestiges of the event dynamics.
V02       3       These details provide essential vestiges for the investigation.
V03       1       The instrument is important, but its absence does not jeopardize the report.
V04       2       The instrument provides ways of extracting information that can be decisive.
V05       3       Instrument analysis is a direct source of important information.
V06       3       It is fundamental to disqualify the event as suicide. The absence of this description may invalidate the credibility of the entire report.
V07       1       This information is required, but its mention does not add much to the case.
V08       2       The evaluation of clothes can be an important source of evidence.
V09       1       This information is required, but its mention does not add much to the case.
V10       3       All the evidence derived from blood studies depends on this analysis.
V11       2       Description of the body position helps to establish the criminal dynamics.
V12       3       The relationship between the body's position and the criminal dynamics is highly important in solving the case.
V13       3       If this dynamic is not determined, the report may be completely mischaracterized.
V14       2       The complementary exams can detect vestiges that may be necessary for case-solving.
V15       1       Place description is required.
V16       1       This information can be helpful in interpreting the criminal's dynamics.
V17       2       Photographic evidence may generate new evidence or help re-interpretation.
V18       2       The sketch may generate new evidence or help re-interpretation.
V19       1       Appropriate language and spelling are required.


Table 3

Mode used in the suicidal action.

A  Hanging                                    45
B  Injury by blunt-piercing instrument        16
C  Impact due to fall from great height        8
D  Poisoning                                   4
E  Not clear in the report                     4
F  Injury by cutting and piercing instrument   3
G  Drowning                                    1

2.1.  Pattern  recognition 

Exploratory  analysis  or  unsupervised  learning  can  help  to 
evaluate  natural  similarities.  The  main  goal  is  to  display  extensive 
and  complex  data  by  reducing  the  system's  dimensions,  which 
provides  a  better  understanding  of  the  structure  and  the  correla¬ 
tions  among  samples  and  the  variables  in  the  data  set.  This  paper 
uses  the  following  techniques: 

• Principal Component Analysis (PCA), which is a multivariate
method that can verify similarities among samples by reducing
the system's dimension.

• Hierarchical Cluster Analysis (HCA), which is a multivariate
method for unsupervised learning. Its goal is to display
data in a two-dimensional space, to emphasize their natural
clustering and patterns.

• Supervised learning methods to assign samples into pre-defined
classes and check the efficiency of this classification. This class of
modeling techniques has many variations, but we only use the
K-th Nearest Neighbor (KNN) and the flexible modeling
method known as SIMCA (Soft Independent Modeling of Class
Analogies).
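As an illustration of the supervised step, the sketch below implements a basic nearest-neighbor classifier in plain Python. It is only a sketch under stated assumptions: the Euclidean distance, the majority vote, the four-question vectors and the class labels are hypothetical choices for this example and do not reproduce the settings of the software package used in the study.

```python
from collections import Counter
import math


def euclidean(a, b):
    """Euclidean distance between two answer vectors of equal length."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def knn_predict(train_x, train_y, query, k=3):
    """Assign `query` to the majority class among its k nearest training samples."""
    ranked = sorted(zip(train_x, train_y), key=lambda pair: euclidean(pair[0], query))
    votes = Counter(label for _, label in ranked[:k])
    return votes.most_common(1)[0][0]


# Hypothetical tri-valued answer vectors (4 questions instead of 19).
train_x = [
    [1, 1, 1, 1], [1, 1, 0, 1], [1, 1, 1, -1],          # class 1 examples
    [-1, -1, 0, -1], [-1, 0, -1, -1], [1, -1, -1, -1],  # class 2 examples
]
train_y = [1, 1, 1, 2, 2, 2]

print(knn_predict(train_x, train_y, [1, 1, 1, 0]))
print(knn_predict(train_x, train_y, [-1, -1, -1, 0]))
```

Repeating the prediction for several values of k and counting the misclassified samples gives a curve of misses versus number of neighbors.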

Partial Least Squares is a regression method that is often
employed to validate models and is particularly important in
the present study. In our case, the data matrix X contains samples
(rows, one for each report) and variables (columns, the questions), and
there is at least one vector RR, called the dependent variable, featuring
data on the properties. The main assumption is that a linear
combination of the information matrix X can describe the
dependent variable RR. The regression vector β indicates which


Fig. 1. Scheme for reports evaluation.

Table 4

Total YES, NO and ID.

      V01  V02  V03  V04  V05  V06  V07  V08  V09  V10
YES    66   46   69   26   40   49   68   17   27   21
NO      9   21    4   43   29   29   10   61   53    9
ID      5   13    7   11   11    2    2    2    0   50

      V11  V12  V13  V14  V15  V16  V17  V18  V19
YES    66   48   50   19   77   35   79   16   79
NO     13   26   27   61    3   45    1   63    1
ID      1    6    1    0    0    0    0    1    0

descriptors are important in modeling RR (Equation (3)). The
principal components are “optimized” to better describe the
relationship between the matrix X and the vector RR
simultaneously.

$$y = X\beta \qquad (2)$$

Validation is necessary to ensure the robustness and predictive
ability of the model, as well as its use with samples that were not
part of the calibration step. In this case we performed from Leave
One-Out (LOO) up to Leave Seven-Out cross-validations.

To strengthen the modeling, some useful indicators must be
evaluated:

• Internal correlation coefficient model cross-validation (Q2), given by

$$Q^2 = 1 - \frac{\sum_{i} (y_{e,i} - y_{v,i})^2}{\sum_{i} (y_{e,i} - \langle y_e \rangle)^2}$$

• Correlation coefficient for calibration (R2), given by

$$R^2 = 1 - \frac{\sum_{i} (y_{e,i} - y_{c,i})^2}{\sum_{i} (y_{e,i} - \langle y_e \rangle)^2}$$

• Root Mean Square Error of Validation (RMSEV), given by

$$\mathrm{RMSEV} = \sqrt{\frac{\sum_{i} (y_{e,i} - y_{v,i})^2}{n}}$$

• Root Mean Square Error of Calibration (RMSEC), given by

$$\mathrm{RMSEC} = \sqrt{\frac{\sum_{i} (y_{e,i} - y_{c,i})^2}{n}}$$

where  n  is  the  number  of  samples,  ye  the  original  data,  yc  the  value 
obtained  from  calibration  and  yv  the  value  calculated  during  cross- 
validation.  Correspondence  statistics  are  characterized  by  the 


Table 5

Number of YES, NO and ID answers for the individual reports.

Report  R01 R02 R03 R04 R05 R06 R07 R08 R09 R10 R11 R12 R13 R14 R15 R16 R17 R18 R19 R20
YES      16  10  16  16   9  19  16  17  19   9  10  11   4   8  10   6   7   8   7  10
NO        3   7   3   2   6   0   3   2   0  10   7   5  13   9   8  12  11  10  11   9
ID        0   2   0   1   4   0   0   0   0   0   2   3   2   2   1   1   1   1   1   0

Report  R21 R22 R23 R24 R25 R26 R27 R28 R29 R30 R31 R32 R33 R34 R35 R36 R37 R38 R39 R40
YES       5   7   7   9   2   9  13   6   5   8   6   6   8   7   5   9  17  11   9  12

Report  R41 R42 R43 R44 R45 R46 R47 R48 R49 R50 R51 R52 R53 R54 R55 R56 R57 R58 R59
YES      13  17  13  18  14  12  10  14  14  13  12  11  11  12  11  14  12  12  13
Fig. 2. Outlier detection.


relation R2 > Q2 and RMSEC < RMSEV. Fig. 1 depicts the procedure
used for analysis of the reports. All multivariate methods were
performed with the Pirouette® package.
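The validation indicators and the leave-N-out scheme described in this section can be sketched in plain Python. This is an illustration only: a one-predictor least-squares line is used as a simple stand-in for PLS (it is not the model fitted in the study), the blocks of N samples are left out sequentially, and the data are hypothetical.

```python
import math


def fit_line(x, y):
    """Ordinary least-squares line y = a + b*x (a simple stand-in for PLS)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    b = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) / sxx
    return my - b * mx, b


def rmse(y_ref, y_hat):
    """Root mean square error: RMSEC with calibration values, RMSEV with cross-validated ones."""
    return math.sqrt(sum((r - h) ** 2 for r, h in zip(y_ref, y_hat)) / len(y_ref))


def determination(y_ref, y_hat):
    """1 - SS_res / SS_tot: R2 with calibration values, Q2 with cross-validated ones."""
    mean = sum(y_ref) / len(y_ref)
    ss_res = sum((r - h) ** 2 for r, h in zip(y_ref, y_hat))
    ss_tot = sum((r - mean) ** 2 for r in y_ref)
    return 1 - ss_res / ss_tot


def leave_n_out(x, y, n_out=1):
    """Predict each sample from a model fitted without its block of n_out samples."""
    preds = [None] * len(x)
    for start in range(0, len(x), n_out):
        held = set(range(start, min(start + n_out, len(x))))
        xt = [xi for i, xi in enumerate(x) if i not in held]
        yt = [yi for i, yi in enumerate(y) if i not in held]
        a, b = fit_line(xt, yt)
        for i in held:
            preds[i] = a + b * x[i]
    return preds


# Hypothetical data: a single predictor (x) and a relevance-like score (y).
x = [4, 6, 8, 9, 10, 12, 14, 16, 17, 19]
y = [0.10, 0.19, 0.25, 0.35, 0.30, 0.60, 0.75, 0.81, 0.91, 1.00]

a, b = fit_line(x, y)
y_cal = [a + b * xi for xi in x]       # calibration values (yc)
y_cv = leave_n_out(x, y, n_out=1)      # cross-validated values (yv), LOO

print("R2 =", round(determination(y, y_cal), 3), "RMSEC =", round(rmse(y, y_cal), 3))
print("Q2 =", round(determination(y, y_cv), 3), "RMSEV =", round(rmse(y, y_cv), 3))
```

Calling `leave_n_out` with `n_out` from 1 to 7 mimics the range of cross-validations described above; comparing the resulting Q2 and RMSEV values across runs is one way to judge how stable a model is.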

3.  Results 

3.1. Qualitative studies

The mode of suicidal action was evaluated for all reports; Table 3
summarizes the results. Hanging was the most common mode of
action, followed by injury by blunt-piercing instrument, and impact
due to fall from great height. Only one, four and three cases concerned
death by drowning, poisoning and injury by cutting and
piercing instrument, respectively. The cause of death was not
identified in four cases. Table 4 contains the number of YES, NO and ID
answers for each variable.


Table 4 shows that YES was the most frequent answer for most of the
variables (V01, V02, V03, V05, V06, V07, V11, V12, V13, V15, V17 and V19).
However, none of the variables obtained 100% YES answers,
i.e., no variable was positive in all the
reports. The complete results are given in the Supplementary
Information (Tables 1.1-1.4).
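The statement about Table 4 can be checked directly from the tabulated counts. The short Python sketch below uses the YES, NO and ID totals of Table 4; the variable names in the code are chosen here for illustration.

```python
# Totals per variable (V01..V19) transcribed from Table 4.
VARIABLES = [f"V{i:02d}" for i in range(1, 20)]
YES = [66, 46, 69, 26, 40, 49, 68, 17, 27, 21, 66, 48, 50, 19, 77, 35, 79, 16, 79]
NO = [9, 21, 4, 43, 29, 29, 10, 61, 53, 9, 13, 26, 27, 61, 3, 45, 1, 63, 1]
ID = [5, 13, 7, 11, 11, 2, 2, 2, 0, 50, 1, 6, 1, 0, 0, 0, 0, 1, 0]

# Variables for which YES is the most frequent answer.
mostly_yes = [v for v, y, n, d in zip(VARIABLES, YES, NO, ID) if y > n and y > d]

# Variables answered YES in all 80 reports.
all_yes = [v for v, y in zip(VARIABLES, YES) if y == 80]

print(mostly_yes)
print(all_yes)
```

The first list reproduces the twelve variables named in the text, and the second list is empty, in agreement with the observation that no variable reached 100% YES answers.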

Table 5 lists the total number of YES, NO and ID answers in each
report. Pareto diagrams for each block of twenty reports are
shown in the Supplementary Information (Figs. 1-4). The results
reveal that only two reports (R06 and R09) had 100% positive aspects,
i.e., they contained only YES answers, corresponding to 2.5%
of the total number of reports analyzed herein. Twenty reports
(R10, R13, R14, R16, R17, R18, R19, R21, R22, R23, R24, R25, R28, R29,
R31, R32, R33, R34, R35 and R39) obtained NO for most of the answers,
indicating that these reports had more negative than positive
aspects. These reports accounted for 25% of the entire
evaluated set. Finally, no ID answers existed in only twenty-two
reports (R01, R03, R06, R07, R08, R09, R10, R20, R24, R29, R37,
R39, R42, R43, R44, R45, R46, R63, R66, R69, R74 and R75),
demonstrating that, for most of the reports (around 72.5%), it was
not possible to determine at least one variable.

3.2. Quantitative studies

We solved Equation (1) for the collected reports; Table 6 presents
the Report Relevance values. Based on the results, the reports were
classified into two classes:

• Class 1. RR values from 0.50 to 1.00 (RR > 0.50): Reports contain
more than the average amount of required information (black).

• Class 2. RR values from 0 to 0.50 (RR < 0.50): Reports contain
less than the average amount of required information (red/grey/
bold).
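The class assignment can be sketched in a few lines of Python. The RR scores below are taken from Table 6; the function name is hypothetical, and placing a score of exactly 0.50 in Class 2 is an assumption of this sketch, since the text does not state where the boundary value falls.

```python
def assign_class(rr, cutoff=0.50):
    """Return 1 if RR is above the cutoff, otherwise 2.

    ASSUMPTION: a score exactly equal to the cutoff is placed in Class 2.
    """
    return 1 if rr > cutoff else 2


# A few RR scores taken from Table 6.
scores = {"R05": 0.59, "R06": 1.00, "R15": 0.48, "R25": 0.06, "R60": 0.52}
classes = {report: assign_class(rr) for report, rr in scores.items()}
print(classes)
```

These labels are the reference classes against which the clustering and classification results are compared.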

To try to validate the methodology suggested by Equation (1),
we applied unsupervised learning techniques such as PCA and HCA
to the data. We also conducted classification by the supervised learning
methods SIMCA and KNN.

Fig. 3. Scores and loadings set for the first three principal components (Factor 1 x Factor 2 x Factor 3).

Fig. 4. HCA results for raw data matrix: (a) samples clustering; (b) variables clustering; (c) zoom on the cluster with RR > 0.50, numbers of each branch, from the top to the bottom
are: R80, R75, R45, R46, R74, R69, R09, R06, R07, R01, R66, R63, R44, R42, R37, R08, R03, R79, R77, R73, R53, R52, R38, R47, R78, R61, R54, R40, R50, R48, R70, R41, R49, R43, R56, R04,
R65, R62, R59, R27, R02, which correspond to the black reports in Table 6; (d) zoom on the cluster with RR < 0.50 (misclassified samples are highlighted), numbers of each branch,
from the top to the bottom are: R72, R71, R10, R32, R22, R13, R36, R33, R18, R15, R24, R20, R35, R13, R14, R34, R31, R24, R21, R39, R29, R19, R16, R68, R58, R12, R28, R26, R67, R60, R55, R30, R11, R05,
which correspond to the red/grey/bold reports in Table 6.

3.3.  PCA  results 

We used PCA to validate the RR empirical equation. First, we
performed PCA on the raw data matrix, i.e., the tri-valued matrix
with 1, 0 and -1 data. We used the RR values to classify samples and
compared this classification with the natural clustering.


Beforehand, we screened for outliers by comparing the sample residual
with the Mahalanobis distance (Fig. 2). We removed five reports (R25,
R76, R51, R57 and R64) from the data set because they presented
outlier behavior; this removal improved the PCA analysis. Fig. 3
depicts the 3-D scores and loadings obtained from PCA for the raw
data matrix. Black and red/grey samples correspond to Class 1 and Class 2,
respectively. Three principal components were able to separate the
two classes and accounted for around 81% of all the information.
Loadings (Fig. 3(b)) indicated that V10 was far from the other
variables and contributed to separating reports with RR < 0.50. V10
corresponds to the bloodstain analysis. The results of the
qualitative analysis show that most of the reports
did not account for this evaluation. PCA loadings show that
V10 is important for suicide analysis; its absence
strongly influences reports with small RR values.

3.4.  HCA  results 

We also used HCA to check clustering by the incremental
method. Fig. 4(a) shows that two distinct groups can be identified:
all the samples in the upper group showed RR > 0.50 while, in the
lower group, most of the samples displayed RR < 0.50. The variables
V04, V05, V08, V09, V10, V14, V16 and V18 ranked the lowest in
the corresponding reports (RR < 0.50). The variable V16 referred to
isolation and preservation of the crime scene, which are the first
and fundamental requirements for successful technical and scientific
research. The goal is to ensure that no evidence is inserted into or
withdrawn from the scene, which would change the interpretation
of the facts.

An  important  element  of  documentation  that  must  be  present  in 
the  report  is  the  sketch  of  the  scene  (V18),  which  gives  a  layout  of 
the  crime  location,  with  evidence  placed  in  the  correct  positions. 
The  sketch  complements  the  narration  and  photographs,  and  aids 
the  understanding  of  the  report. 

When seeking a differential diagnosis between suicide and homicide,
it is crucial to analyze the clothes (V08) and the instrument
of violent action (V05). The garments on the body frequently provide
important information and help to unravel the dynamics of the
facts, such as identifying or excluding a physical struggle before death. As
for instruments, many can be used as weapons. The expert must
collect them (V04) for further analysis (V14), to determine the relationship
between the injury and the instrument.

The variable analysis of bloodstains (V10) identified those reports
that were considered unsatisfactory. The presence of
bloodstains (V09) and their analysis allow the expert to evaluate
the displacement of the victim at the scene, the intensity of the
trauma and a possible aggressive position, among other
inferences.


Both  unsupervised  techniques  were  able  to  provide  a  specific 
behavior  for  the  studied  classes.  HCA  had  six  misclassified  samples, 
which  corresponded  to  8%  of  the  whole  set  of  reports.  The  variables 
accounting  for  report  classification  as  unsatisfactory  reports  were 
the  same:  V04,  V05,  V08,  V09,  V10,  V14,  V16  and  V18,  agreeing  with 
data  from  Table  4.  Most  of  the  reports  lacked  these  variables:  a 
higher  number  of  reports  contained  NO  and  ID  responses  to  these 
variables  as  compared  with  the  number  of  reports  with  YES 
answers. 

A detailed examination of class reports with RR < 0.50 showed
that seven samples were misclassified (highlighted in Fig. 4(d): R05,
R12, R55, R58, R60, R67 and R68). These misclassifications can be
understood using Tables 1.1 to 1.4 (Supplementary Information),
which indicate that these reports lacked a positive answer for the
variables that led to RR < 0.50.

3.5.  KNN  results 

As for KNN analysis, we did not use any preprocessing and we
tested a maximum of neighbors. Fig. 5 shows the number of misses
versus the number of neighbors. For three neighbors, we achieved optimal
classification with 0 misses and a 0.95 probability threshold. Since KNN
is a hard-modeling technique, this result can be considered a good
classification: an optimal number of neighbors can classify reports
without errors.

3.6.  SIMCA  results 

For SIMCA we selected four principal components as being ideal
for modeling with a 0.95 probability threshold. Figs. 6 and 7 show
scores for Class 1 and Class 2, respectively. R05 is far from the other
samples, followed by R02, R12, R55, R58, R60, R67 and R68.
Although these reports presented RR over 0.51, the red/grey variables
influenced them (Fig. 8, Loadings). This led to their classification
as Class 1, agreeing with PCA and HCA analyses. In the case of
this class, four principal components account for 91% of the information.
For Class 2, four principal components account for 87% of
the entire information.


Table 6

Score obtained for each report from Equation (1).

Report  RR    Report  RR    Report  RR    Report  RR
R01     0.81  R21     0.16  R41     0.78  R61     0.82
R02     0.69  R22     0.20  R42     0.91  R62     0.84
R03     0.90  R23     0.23  R43     0.71  R63     0.81
R04     0.94  R24     0.30  R44     0.95  R64     0.82
R05     0.59  R25     0.06  R45     0.75  R65     0.72
R06     1.00  R26     0.44  R46     0.60  R66     0.76
R07     0.75  R27     0.66  R47     0.65  R67     0.72
R08     0.91  R28     0.43  R48     0.84  R68     0.91
R09     1.00  R29     0.18  R49     0.84  R69     0.89
R10     0.30  R30     0.38  R50     0.82  R70     0.78
R11     0.4   R31     0.21  R51     0.82  R71     0.22
R12     0.71  R32     0.15  R52     0.66  R72     0.33
R13     0.1   R33     0.32  R53     0.66  R73     0.66
R14     0.36  R34     0.24  R54     0.77  R74     0.70
R15     0.48  R35     0.16  R55     0.80  R75     0.76
R16     0.19  R36     0.37  R56     0.81  R76     0.79
R17     0.20  R37     0.91  R57     0.82  R77     0.66
R18     0.25  R38     0.66  R58     0.73  R78     0.71
R19     0.27  R39     0.35  R59     0.76  R79     0.66
R20     0.28  R40     0.77  R60     0.52  R80     1.00

Fig. 5. Number of misses versus number of neighbors.

Fig. 7. Scores for class 2: RR < 0.50.


The distance between the two assigned classes was around
0.73, i.e., higher than the cutoff used to
define the classes (0.50). The classes' residuals were higher when
we fitted one class against the other (Table 7), indicating good
discriminating power.

The SIMCA modeling produced two misclassifications. R02, initially assigned to Class 1, was reclassified by SIMCA as Class 2, and the opposite happened for R39. In R02, most of the variables that typically lead to RR < 0.50 had NO or ID answers. In R39, by contrast, these variables had mostly YES answers even though its RR was below 0.50. The overall misclassification (2 reports in 75) represented less than 3% of the total number of reports. According to the literature, if a model classifies 95% of the samples correctly and no class contains more than one misclassification, the results are highly acceptable.
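
The arithmetic behind this acceptance check can be sketched in a few lines of Python. This is a minimal illustration, not the software used in the study: the function names `class_from_rr` and `is_acceptable` are hypothetical, the RR scores for R02 and R39 are taken from Table 6, and the remaining 73 reports are simply assumed to be classified correctly, as stated above.

```python
from collections import Counter

RR_CUTOFF = 0.50  # cutoff used in the text to define the two classes


def class_from_rr(rr):
    """Reference class from the RR score: Class 1 above the cutoff, Class 2 below."""
    return 1 if rr > RR_CUTOFF else 2


def is_acceptable(n_total, misses_per_class, min_accuracy=0.95):
    """Criterion quoted in the text, read here as: at least 95% correct overall
    and no class with more than one misclassification."""
    n_missed = sum(misses_per_class.values())
    accuracy = 1 - n_missed / n_total
    worst_class = max(misses_per_class.values(), default=0)
    return accuracy >= min_accuracy and worst_class <= 1


# RR scores from Table 6 and the SIMCA assignments reported in the text.
rr_scores = {"R02": 0.69, "R39": 0.35}
simca_class = {"R02": 2, "R39": 1}

# Count misclassifications per reference class.
misses = Counter(
    class_from_rr(rr_scores[r])
    for r in rr_scores
    if class_from_rr(rr_scores[r]) != simca_class[r]
)

n_reports = 75  # number of reports in the SIMCA model, as stated in the text
error_rate = sum(misses.values()) / n_reports

print(f"misclassified: {sum(misses.values())} of {n_reports} ({error_rate:.1%})")
print("acceptable:", is_acceptable(n_reports, misses))
```

Each class contributes exactly one misclassification, so both parts of the criterion are met.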

3.7.  PLS  results 

We analyzed the raw data matrix against the RR score as the dependent variable. The main goal was to conduct a multiple regression on the variables and examine how much each contributes to the RR values. The purpose here is not to select variables but to verify which variables contribute the most to a satisfactory report.


Fig. 6. Scores for class 1: RR > 0.50.


We built the models without pretreatment and validated them with schemes ranging from Leave One Out (LOO) to Leave Seven Out (LNO, N = 7). The idea was to test the effect of removing around 10% of the total number of reports. All the models furnished the same results: five principal components were optimal and accounted for around 86% of the information. Q2 and R2 values were 0.94 and 0.97, respectively. RMSEC was lower than RMSEV in all cases. The results were robust, since we obtained the same numbers in all validation tests. Hence, the RR equation can give an idea of the amount of information in a report. The regression is presented in Fig. 9, which displays the two classes in clearly distinct regions. The regression vector and the coefficients for each variable are shown in Table 8.
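
To make the validation figures concrete, the sketch below shows how leave-N-out splits and the reported statistics are commonly computed. It is an illustration under stated assumptions, not the authors' procedure: the helper names are hypothetical, the folds are assumed to be consecutive blocks of N reports, the sample count of 75 is borrowed from the SIMCA discussion, and the y values in the usage lines are made-up numbers rather than data from the study. R2 is obtained by applying `explained_fraction` to calibration predictions and Q2 by applying it to cross-validated predictions; RMSEC and RMSEV follow the same pattern with `rmse`.

```python
import math


def leave_n_out_folds(n_samples, n_out):
    """Yield (train, test) index lists, holding out consecutive blocks of n_out samples.

    n_out=1 gives leave-one-out; the block layout is an assumption of this sketch.
    """
    indices = list(range(n_samples))
    for start in range(0, n_samples, n_out):
        test = indices[start:start + n_out]
        train = indices[:start] + indices[start + n_out:]
        yield train, test


def rmse(y_true, y_pred):
    """Root mean square error between reference and predicted values."""
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))


def explained_fraction(y_true, y_pred):
    """1 - SSE/SST: gives R2 for calibration predictions, Q2 for cross-validated ones."""
    mean = sum(y_true) / len(y_true)
    sse = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    sst = sum((t - mean) ** 2 for t in y_true)
    return 1 - sse / sst


# Leave-seven-out on 75 samples (assumed count) holds out roughly 10% per fold.
folds = list(leave_n_out_folds(75, 7))
print(len(folds), len(folds[0][1]), len(folds[-1][1]))

# Made-up reference and predicted values, only to exercise the metric functions.
y_ref = [0.2, 0.4, 0.6, 0.8]
y_hat = [0.25, 0.35, 0.65, 0.75]
print(round(rmse(y_ref, y_hat), 3), round(explained_fraction(y_ref, y_hat), 3))
```

In a full workflow, a regression model would be refitted on each `train` list and used to predict the matching `test` list; the pooled held-out predictions then feed `explained_fraction` and `rmse`.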

Two variables contributed the most to the suitability of the reports: the presence or absence of defense lesions in the victim (V06) and the correlation between traces and dynamics of the facts (V13). These variables were inter-related and were essential to the differential diagnosis in the analysis of a suicide scene, helping to rule out the hypothesis of a simulation.

Table 7

Interclass residuals.

           Class 1    Class 2
Class 1    0.31       0.55
Class 2    0.49       0.30

Fig. 8. Loadings for raw data matrix.

Fig. 9. Calibration curve for PLS regression.

4.  Conclusion 

The aim of this study was to assess reports of suicide to determine the weaknesses of the procedures used by official expertise. We formulated 19 questions to evaluate the reports. These variables could be answered as YES, NO or ID (Impossible to Determine), which were assigned the values 1, -1 and 0, respectively. The importance of each variable for the report depended on the context of the situation and was used in an empirical equation that calculated the Report Relevance (RR). We validated the RR equation using multivariate techniques (unsupervised HCA and PCA; supervised KNN and SIMCA), which discriminated between the classes assigned for RR < 0.50 and RR > 0.50 with similar results. These techniques indicated that it was essential to analyze some variables in this type of crime scene, namely V04, V05, V08, V09, V10, V14, V16 and V18.
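
The coding scheme is easy to express in code. The sketch below is only an illustration: `ANSWER_CODES` and `encode_report` are hypothetical names, and the five answers are invented, not taken from any report in the study. It covers only the YES/NO/ID coding, not the RR equation itself.

```python
# Numerical coding of the answers as described in the text.
ANSWER_CODES = {"YES": 1, "NO": -1, "ID": 0}  # ID = Impossible to Determine


def encode_report(answers):
    """Convert a list of YES/NO/ID answers into one numeric row of the data matrix."""
    try:
        return [ANSWER_CODES[a.strip().upper()] for a in answers]
    except KeyError as exc:
        raise ValueError(f"unknown answer: {exc.args[0]!r}") from None


# Invented answers for the first five questions of a hypothetical report.
row = encode_report(["YES", "no", "ID", "Yes", "NO"])
print(row)  # [1, -1, 0, 1, -1]
```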

PLS analysis modeled the RR values against the variables. Validation from LOO to LNO (N = 7) provided good values for Q2 and R2, around 0.94 and 0.97, respectively. The regression vector revealed a major influence of variables V06 (absence of fighting signals) and V13 (matches between traces and dynamics) on the RR values. In fact, all other variables affected these two in their evaluation.

Table 8

Regression Vector: coefficients for each variable to compose RR-vector.

Variable    Coefficient
V01         -0.046
V02          0.051
V03         -0.050
V04          0.018
V05          0.073
V06          0.144 *
V07          0.051
V08          0.056
V09          0.040
V10         -0.027
V11          0.031
V12          0.055
V13          0.198 *
V14          0.090
V15          0.049
V16          0.077
V17          0.083
V18          0.069
V19          0.072

* Values marked with an asterisk (bold in the original) indicate the most important variables to describe RR.
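
The ranking of the variables can be reproduced directly from the coefficients in Table 8. The snippet below is a minimal sketch: the name `coefficients` and the use of the absolute value as the ranking criterion are choices made for this illustration. It sorts the coefficients by magnitude and lists the largest ones.

```python
# Regression coefficients transcribed from Table 8.
coefficients = {
    "V01": -0.046, "V02": 0.051, "V03": -0.050, "V04": 0.018, "V05": 0.073,
    "V06": 0.144, "V07": 0.051, "V08": 0.056, "V09": 0.040, "V10": -0.027,
    "V11": 0.031, "V12": 0.055, "V13": 0.198, "V14": 0.090, "V15": 0.049,
    "V16": 0.077, "V17": 0.083, "V18": 0.069, "V19": 0.072,
}

# Rank variables by the magnitude of their coefficient, largest first.
ranked = sorted(coefficients.items(), key=lambda kv: abs(kv[1]), reverse=True)

for name, coef in ranked[:3]:
    print(f"{name}: {coef:+.3f}")
```

V13 and V06 come out first, well ahead of the third-ranked variable, which matches the two variables singled out in the text.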

According to the qualitative results, the way the documents discussed the subject differed markedly, which can raise doubts in the event of litigation. In many cases, the conclusion had no clear correlation with the set of evidence presented. The qualitative approach showed that no variable was dispensable in this study, and that the Report Relevance needs to consider the importance associated with the context of the crime scene.

Finally, we concluded that the suicide reports did not follow any standardized methodology. This lack of a reference procedure complicates the work of government experts. This study suggests that efforts toward the standardization of the procedures must be made. Standardization will entail more homogeneous results for analyses, thereby reducing the possibility of alternative discussions in the judicial arena. This approach also has potential application in different regions and countries, as well as in other forensic problems, such as drug reports, homicides and other forensic investigations.